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The two-dimensional Hubbard model is studied using the 
variational quantum Monte Carlo technique with Gutzwiller- 
type variational wave functions. In addition to the simple one- 
site correlated Gutzwiller wave function, we use a form with 
correlation between electrons on nearest- and next-nearest- 
neighbor sites. We show that the stochastic gradient approxi- 
mation method suits very well for the optimization of the free 
parameters of the variational wave functions. 

PACS numbers: 71.10.Fd, 02.70.Ss, 02.60. Pn 



I. INTRODUCTION 

The one-band Hubbard Hamiltonian is the simplest 
lattice model for studying strongly correlated elec- 
trons 0| . It is widely studied, especially in its two- 
dimensional version, as a model for the high-T c super- 
conducting cuprates. Among the various methods used 
for studying the Hubbard model, the quantum Monte 
Carlo (QMC) methods have shown to be powerful tools, 
both at zero and finite temperatures. The studies of 
the Hubbard model at zero temperature include several 
QMC methods, namely, the variational (VMC) |]||, the 
fixed- node diffusion (DMC) (^Q, and the constrained 
path QMC (CPMC) From these zero tempera- 

ture methods, only CPMC does not depend crucially on 
the variational wave function. Actually, the simple free- 
electron-like wave functions are rather generally found 
to be better importance functions for CPMC than the 
unrestricted Hartree-Fock wave functions ||. On the 
other hand, CPMC is limited to smaller lattice sizes than 
VMC. It can, however, give accurate results for system 
sizes far beyond the limits of exact diagonalizations ||] 
or the subspace techniques such as stochastic diagonal- 
ization |)j . Also the finite temperature QMC simulations 
can be used to obtain ground state properties if the tem- 
perature used is low enough fljof . This approach is, how- 
ever, mainly applicable for half-filled systems due to the 
problems related to the negative weights of the configu- 
rations, i.e., the fcrmion sign problem. 

Compared to the other approaches, the VMC meth- 
ods are very potent for studying larger system sizes. The 
computational cost of the VMC method is roughly an 
order of magnitude smaller than of the more sophisti- 
cated QMC methods. The VMC method is basically free 
from any methodological problems such as the fermion 
sign problem. There is, however, one very serious lim- 
itation in the VMC approach. This is related to the 



ultimate connection of the method to the many-body 
wave function itself. The form of the wave function 
is an input to the VMC simulations, and choosing the 
form requires human creativity and insight into the prob- 
lem. Due to this, VMC can hardly be called a reliable 
"black-box" simulation method. However, the recent ad- 
vancements in the optimization of the variational wave 
functions jll],[l2| combined with the enormous increase 
in computing power indicate a possibility of having con- 
siderably more flexible VMC wave functions and making 
the VMC simulations less "human-biased" . 

The crucial part of the VMC simulation is to locate 
the optimal values of the free parameters in the many- 
body wave function. An efficient optimization method 
allows one to have more variational parameters and thus 
a more flexible form of the wave function and saves com- 
puter time. In this paper, we show that the optimiza- 
tion method called the stochastic gradient approximation 
(SGA) jn| suits very well for the VMC also in the case 
of lattice Hamiltonians. We have previously used it for 
continuum models, mainly quantum dots |l3| . 

The goal in the VMC optimization is to minimize the 
cost function 
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where configuration Rj contains the coordinates of the 
simulated particles at jth step of a random sequence, 
ex. = (ctij ■ • ■ , a n ) represents the vector of n parameters 
to be optimized, and Q is the "local" version of the cost 
function of a configuration R and a. One can formu- 
late the VMC optimization problem approximately as a 
"standard" optimization problem by taking a finite but 
large k in the cost function above. One can then use 
the resulting approximation of the function as a deter- 
ministic function during the optimization. Doing this, 
one approximates the whole distribution of Rs by a fi- 
nite set, and more importantly, one forgets the a depen- 
dence of R. To some extent this error can be corrected 
by weighting the configurations by factors that depend 
on the change of the probability of the configuration in 
question. 

In the SGA optimization method such truncation is 
not done. In SGA, the optimal parameter vector a.*, 
defined so that 
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is found by changing the parameters a. to the direction 
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of the unbiased stochastic approximation of the nega- 
tive gradient, i.e., -VaQ. The approximate gradient 
Va Q(R; a*) is not zero even for the optimal parameters, 
but the average of it over configurations R vanishes. Due 
to this, the stochastic simulation should include damping 
so that it actually converges. Without damping, the sim- 
ulation would end up in oscillating around the optimal 
parameters. The damping should, however, be so slow 
that the simulation is able to reach the optimal param- 
eters. A more detailed formulation of the SGA method 
for VMC is given in Sec. 0. 



II. METHOD 

We use the two-dimensional Hubbard model given by 
the Hamiltonian 

where (...) stands for nearest neighbors, cL (c^) creates 
(destroys) an electron with spin a at site i of the square 



lattice, and ru, 



The electrons at the same 



site interact with strength U > 0, and t is the hopping 
parameter. We use t as the energy scale, and set t = 1. 
The periodic boundary conditions are used for both space 
directions. This choice of boundary conditions leads to 
non-interacting single particle states that are plane- waves 
exp(ik • r) with energy = —2(cos(k x ) + cos(k y j). 

A commonly used variational wave function for the 
Hubbard model is the Gutzwiller wave function Il3 



(4) 



where D = J^. n^n^ is the number of doubly occupied 
sites, g is the only variational parameter, and is the 
many-body wave function for the non-interacting ground 
state. In this case, it is made of the plane-waves dis- 
cussed above, with the k values chosen to minimize the 
energy. The motivation for this wave function is that the 
Gutzwiller factor reduces the probability to find electrons 
on the same site and reduces the average interaction en- 
ergy. This is done, of course, with a cost of higher kinetic 
energy. It is also possible to correlate electrons that are 
not on the same site by constructing a Gutzwiller-type 
wave function 



(5) 



where the gis are parameters and the C;'s measure the 
number of nearest (and next nearest) neighbor electrons 
with various spin configurations, see Fig |l|. for more de- 
tails. 

We calculate the energy using the standard Metropolis 
algorithm (see, e.g., Refs. [Q for more details and Ref. ||] 
for a modified algorithm), giving the estimate for the 
energy as E — (-El), where the local energy is defined as 



E h (R) 
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and the average in the calculation of E is over configura- 
tions R distributed according to probability distribution 
oc ^(R)! 2 . The standard deviation of the local energy is 
given by 



a = y/((E L E) 2 ) . (7) 
One can also define a modified version of a, namely, 



y/{(EL - E T ) 2 ) = J a 2 + (E — E T ) 



(8) 



which gives the root-mean-square distance of the local 
energy from the target energy E^. The deviation a is 
particularly useful in the optimization of the wave func- 
tion. 

The SGA optimization method involves stochastic sim- 
ulation in two spaces: the configuration and the param- 
eter space. These spaces are coupled via the parameter 
vector. In the configuration space, a set of m configura- 
tions {Rj} is sampled from a distribution ^(a)! 2 , where 
a is the current parameter vector. When the parameters 
are changed, the set of configurations follow this change 
because the new sampling distribution depends on the 
new parameters. In practice, the set of configurations 
is found by the Metropolis algorithm. In the parameter 
space, the parameters at iteration i + 1 are obtained from 
the previous ones by the formula: 



-i = ol l - 7,V a S, 



(9) 



where ji is a scaling factor of the step length. The scal- 
ing factor has an important role in averaging out the 
fluctuations in the approximate gradient, ensuring the 
convergence. On the other hand, too small a value of 7 
would damp the simulation too much. These rules can 
be formulated mathematically as: 
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There is a simple interpretation for these conditions. The 
sum of 7 2 should be finite to dissipate the cumulative er- 
ror given by the noise in the approximate gradient and 
the sum of 7 should diverge, because otherwise the max- 
imum distance from the initial parameters would be lim- 
ited. If one uses a formula 7$ oc i' 13 , one should have 
h < (5 < 1. The choice of (3 — I, which is the maximally 
damped case, leads to a formula similar to the recursive 
calculation of a mean: 2, = X4-1 — i(xi-i — Xi). 

In Eq. (H), the approximate gradient is calculated using 
the set of m configurations. There are several possibilities 
for the cost function Q. For the energy minimization, the 
cost function is simply the mean of the local energies over 
the set of configurations: 
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and for the variance minimization one has Q = 
1/to i (El (R'f ) — {Ei)) 2 . In the variance minimiza- 
tion, one can also use a target energy Et instead of (El). 
This leads to the minimization of the function a above. 

In the original implementation of SGA as presented 
in Ref. jnj, the gradient in Eq. (|^) was the most diffi- 
cult task and was calculated by using a finite-difference 
formulation. In this case, one should note that the con- 
figurations are distributed according to | "1/ (ck) | 2 and not 
according to |^(a ± A)| 2 , where A represents a small 
change. Thus in the energy minimization, the finite- 
difference points are weighted as 



— ^2 WjE L [Kj ; a ± A] 



(12) 
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where the points Rj are distributed according to ^(c*) 



Y^, Wj , and Wj 



\y(R.j;a± A)/*(R,-;a)p. The 
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'weights' Wj of the local functions are very close to unity, 
because A is only a small change. 

It is, however, possible to calculate the gradient also 
analytically. In Ref. Lin et al. have shown that in 
the case of real wave function and energy minimization, 
the derivative of the energy E with respect to a varia- 
tional parameter en is simply 



dE 
da 
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where the average (...) is over the whole Metropolis sim- 
ulation |l2[ ] . One can implement this simple formula also 
for the SGA algorithm, with the small modification that 
the average is taken over only the current set of m config- 
urations. One should also note that the finite-difference 
formulation required calculation of the local energy with 
several parameters, whereas only a single evaluation is 
needed (for each configuration) if the analytic formula is 
used. 

In this work, we have used the both schemes discussed 
above for the calculation of the gradient. 



are at g « 0.6. The energy as a function of g is in a good 
agreement with Fig. 2. of Ref. |J. The predicted value 
of g resulting from the minimization of energy in Ref. Q| 
is slightly higher, around g ~ 0.566. The statistical error 
in the energies is too large in order to locate the optimal 
value of g with accuracy better than 0.01. 

Next, the SGA optimization method is used to locate 
the optimal values of g, with energy and a as the cost 
functions. In Fig. |^, the Gutzwiller parameter gi is plot- 
ted as a function of the SGA optimization step i for both 
energy and variance minimizations for the same system 
parameters as above. We have calculated the gradient 
using the original, finite-difference formulation. The sim- 
ulations converge to g 0.566 in the case of energy min- 
imization, and togK 0.603 in the optimization of a with 
the target energy Et = —280.5. Both parameter val- 
ues are in a good agreement with the independent sim- 
ulations presented above, and with the estimate of the 
optimal value of Ref. H . One can also see that around 
1000 steps are enough to estimate the optimal parameter 
values with a reasonable accuracy. The computational 
task of this is smaller than of one independent simulation 
presented in Fig. || for a single value of g. On the other 
hand, it seems that the accuracy of the optimal parame- 
ters found by SGA is one order of magnitude higher than 
in using the polynomial fit to the independent points. We 
have performed several simulations starting from differ- 
ent values of go- In every case, the performance of SGA 
is similar to those shown in Fig. ^. 

Fig. |J shows the first 200 values of the Gutzwiller pa- 
rameter gi for the simulation using the analytic deriva- 
tive of Eq. ( |l3| ) . This simulation converges also very ac- 
curately to the optimal value found above. The most 
important feature are that the fluctuations in the value 
of g during the optimization process is much smaller than 
in the simulation where the finite-difference formula was 
used. This leads to a faster convergence. 

To summarize, the performance of the SGA method 
has been found very satisfactory in finding the opti- 
mal parameter of the simple single-parameter Gutzwiller 
wave function. This is especially true in the case where 
the gradient is calculated analytically. 



B. Generalized Gutzwiller wave function 



III. RESULTS 

A. Single Gutzwiller parameter 

First, we will consider the wave function with correla- 
tion only between electrons on the same site, and thus 
only one free parameter g. In Fig. the energy and 
the deviations of the local energy a and a are plotted as 
a function of the Gutzwiller parameter g. The system 
consists of 101 + 101 electrons on a 16 x 16 lattice with 
U = 4. One can see that the minimum of the total en- 
ergy is located at g « 0.56, and the minima of a and a 



Next, we will consider the generalized Gutzwiller wave 
function, defined in Eq (^). We consider only the gener- 
alization with 4 parameters related to typical configura- 
tions shown in Fig [l]. The aim of these parameters is to 
capture partially the missing correlations. The general- 
ization made to the Gutzwiller wave function is bosonic 
in the sense that it does not change the nodal structure 
of the one-parameter Gutzwiller wave function. The en- 
ergy gain of this extra correlation could directly be com- 
pared to the energy of the fixed-node DMC simulations of 
Ref. p|] as the nodal structure used there is also given by 
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the simple Gutzwiller wave function. This DMC energy 
is lower than the corresponding VMC energy by ~ 2.5 
units. One should note that in the lattice formulation of 
DMC, unlike in the continuum formulation, the energy 
depends also on the bosonic correlation factor which does 
not change the nodes. This dependence is much smaller 
than the dependence of the VMC energy on g. There are 
also correlations that change the nodal structure, and the 
estimate for the importance of these can be estimated 
from the CPMC energy which is lower than the VMC 
energy by more than 6 units ||. 

We have optimized the 4 free parameters of the gen- 
eralized Gutzwiller wave function using the SGA and 
the analytic formulation of the gradient. The latter 
choice is due to better results in the one-parameter case. 
One should note that setting the parameter g\ = g and 
.92 = .93 = .94 = 1 one obtains the simple Gutzwiller wave 
function. We have again studied 101 + 101 electrons on 
a 16 x 16 lattice with U = 4 as above. 

The optimization of the energy leads to a reduced on- 
site correlation factor of g\ « 0.51. The other parameters 
are gi ~ 0.92 and <?3 ~ g± ~ 0.98. The smaller value of 
gi does not cost as much kinetic energy in this case as 
in the case of a simple Gutzwiller wave function, because 
the values of g^ and g^ are also smaller than one. It is 
also interesting to compare the optimal value of g\ found 
here with the optimal g s=s 0.52 of the simple Gutzwiller 
wave function determined from the g dependence of the 
DMC energy ||. The optimization converges again in 
few hundred steps, in similar fashion as shown in Fig. ^. 

The energy calculated with the optimal parameters 
is presented in the Table | with corresponding one- 
parameter VMC, DMC, and CPMC results. One can 
see that the generalization of the Gutzwiller wave func- 
tion is able to lower the energy by 1.1 units, which is 
less than half of the difference to the DMC energy. The 
CPMC energy is still around 5 units lower in energy. 

There are still important ingredients missing from the 
variational wave function. Possible extensions are three- 
and higher-particle correlation factors, modified single- 
particle states, and multiple-determinant wave functions. 
The good performance of the SGA method combined 
with the simple calculation of the energy gradient could 
be extremely useful in the studies exploring these direc- 
tions. 



useful in finding more accurate wave functions for the 
Hubbard model. 
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IV. CONCLUSIONS 
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We have shown that the SGA optimization method .. . . . . . 

finds the optimal values of the Gutzwiller wave function OgO O^)^ O | O O ^ ^ 

parameter in a reliable and efficient fashion. The com- 
putational cost of the optimization process is comparable 
to a single calculation of the expectation values with a ^ 
fixed parameter value. The good performance is very im- 
portant particularly if a variational wave function with 
several parameters is used. Due to this, SGA is very 
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FIG. 1. Typical configurations measured by the correlation 
factors in the Gutzwiller-type wave function. For example, Cs 
counts the number of electron pairs that have opposite spin 
electrons differing in both coordinates by one. 
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FIG. 2. Energy as a function of the Gutzwiller parameter 
g. The solid line shows a fourth-order polynomial fit. The 
inset shows the deviations of the local energy, defined as a 
and a in the text, marked with V and '+', respectively. In 
a, Et = —280.5 has been used. 
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FIG. 3. The Gutzwiller parameter gi for the optimization 
step i. The solid (dashed) line corresponds to the variance 
(energy) minimization, respectively. Both simulations are 
started from go = 0.65. 
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FIG. 4. The Gutzwiller parameter gi for the first 200 opti- 
mization steps i using the analytic calculation of the gradient. 
Simulation is started from go = 0.65. The solid line show the 
optimal parameter value, and the dotted ones show a range 
of parameter value that gives results with energy error within 
the statistical uncertainty of the Fig. ^. The fluctuation of 
the parameter value is much smaller than in Fig. H. 



TABLE I. Energy of 101 + 101 electrons on a 16 x 16 lattice 
with [7 = 4 for different methods. Numbers in the parenthesis 
are statistical errors in energy. The number of VMC param- 
eters is also given. The DMC energy is from Ref. M and 
CPMC from Ref. 



Method 


Energy 


VMC 1 


-280.3(1) 


VMC 4 


-281.4(1) 


DMC 


-283.0(5) 


CPMC 


-286.55(8) 
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